Add statistical methods #33
Closed
This PR
Notes
Statistical methods are widely implemented across dataframe libraries and are used by downstream libraries.
Series
Not included, based on previous consortium discussions in which a dataframe/series distinction was not considered necessary. See Avoiding the "pandas trap" #4 and Separate object for a dataframe colum? (is Series needed?) #6.
vaex and ibis have considerably different APIs from pandas, Dask, Modin, cuDF, and Koalas, and influenced API inclusion only through whether they provide a particular method name (or an equivalent), not through keyword arguments.
Comments for each proposed method:
skipna. Both cuDF and Koalas support skipna, but not axis (pandas, Dask, Modin).
cummax.
cummax.
cummax.
axis. pandas, Modin, and Koalas support numeric_only, but the other libraries do not.
max.
max.
nlargest.
max. Koalas can only support positive numbers due to its implementation algorithm. pandas, Dask, Modin, and cuDF support a min_count keyword argument, but Koalas does not.
max. Koalas does not support a correction factor. Similar to the array API specification, ddof has been renamed to correction, as the ddof name is a historical "bug" carried over from NumPy.
max. pandas, Dask, Modin, and cuDF support a min_count keyword argument, but Koalas does not.
std.
Methods excluded from this initial proposal: mode, median, nunique, and quantile. They were left out due to lack of universal availability, divergent behavior, increased complexity, or lack of downstream usage, and can be considered in a future PR.
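The cumulative-method comment above (skipna is supported, axis is not) can be illustrated with a minimal pure-Python sketch of a single-column cummax. The function names, the list input, and the use of None or NaN as the missing-value marker are assumptions for illustration only. The skipna semantics follow pandas: a missing value stays in place when skipna=True and propagates to every later position when skipna=False.

```python
import math
from typing import List, Optional


def is_missing(v: Optional[float]) -> bool:
    # Assumption: missing values are represented as None or float("nan")
    return v is None or (isinstance(v, float) and math.isnan(v))


def cummax(values: List[Optional[float]], skipna: bool = True) -> List[Optional[float]]:
    """Cumulative maximum over one column; there is deliberately no axis parameter."""
    result: List[Optional[float]] = []
    running: Optional[float] = None
    poisoned = False  # becomes True once a missing value is seen with skipna=False
    for v in values:
        if is_missing(v):
            if not skipna:
                poisoned = True
            result.append(None)  # missing positions stay missing in either mode
        elif poisoned:
            result.append(None)  # skipna=False: everything after a missing value is missing
        else:
            running = v if running is None else max(running, v)
            result.append(running)
    return result


print(cummax([1.0, None, 3.0, 2.0]))                # [1.0, None, 3.0, 3.0]
print(cummax([1.0, None, 3.0, 2.0], skipna=False))  # [1.0, None, None, None]
```

Dropping axis keeps the operation column-oriented, which matches what cuDF and Koalas support.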
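The numeric_only keyword (supported by pandas, Modin, and Koalas) controls whether a dataframe-level reduction skips non-numeric columns. A minimal sketch, assuming a hypothetical frame_max function and a dict-of-lists stand-in for a dataframe (column name to column values). A column counts as numeric here when every value is a numbers.Number, which is a simplification of the dtype-based selection real libraries perform.

```python
from numbers import Number
from typing import Any, Dict, List


def frame_max(columns: Dict[str, List[Any]], numeric_only: bool = False) -> Dict[str, Any]:
    """Column-wise maximum over a dict-of-lists stand-in for a dataframe."""
    out: Dict[str, Any] = {}
    for name, values in columns.items():
        is_numeric = all(isinstance(v, Number) for v in values)
        if numeric_only and not is_numeric:
            continue  # skip non-numeric columns rather than reducing them
        out[name] = max(values)
    return out


df = {"a": [1, 5, 3], "b": ["x", "z", "y"]}
print(frame_max(df))                     # {'a': 5, 'b': 'z'}
print(frame_max(df, numeric_only=True))  # {'a': 5}
```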
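The Koalas limitation noted above (positive numbers only) is consistent with emulating a product through logarithms, prod(x) = exp(sum(log(x))), a common trick when only a sum aggregate is available. Our understanding is that the Koalas documentation describes its prod this way, but treat that as an assumption. Because log is undefined for zero and negative inputs, the trick cannot handle them. A minimal sketch with a hypothetical prod_via_logs helper:

```python
import math
from typing import Sequence


def prod_via_logs(values: Sequence[float]) -> float:
    """Product emulated as exp(sum(log(x))); only valid for strictly positive inputs."""
    if any(x <= 0 for x in values):
        return math.nan  # log(x) is undefined for x <= 0, so the emulation cannot proceed
    return math.exp(math.fsum(math.log(x) for x in values))


print(prod_via_logs([2.0, 3.0, 4.0]))  # approximately 24.0 (floating-point rounding may apply)
print(prod_via_logs([-2.0, 3.0]))      # nan
```

A direct product (as in pandas, Dask, Modin, and cuDF) has no such restriction; the log-based version trades correctness on non-positive inputs for reuse of an existing sum aggregate.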
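The min_count keyword mentioned above (supported by pandas, Dask, Modin, and cuDF, but not Koalas) sets the minimum number of non-missing values required for a result; below that threshold the result is missing. A minimal sketch, assuming hypothetical column_sum and column_prod helpers over a Python list with None or NaN as missing values, and mirroring pandas' default of min_count=0 (so an all-missing sum is 0 and an all-missing product is 1):

```python
import math
from typing import List, Optional


def _valid(values: List[Optional[float]]) -> List[float]:
    # Assumption: None or float("nan") marks a missing value
    return [v for v in values if v is not None and not (isinstance(v, float) and math.isnan(v))]


def column_sum(values: List[Optional[float]], min_count: int = 0) -> Optional[float]:
    valid = _valid(values)
    if len(valid) < min_count:
        return None  # too few non-missing values: the result is missing
    return sum(valid)  # an empty sum is 0


def column_prod(values: List[Optional[float]], min_count: int = 0) -> Optional[float]:
    valid = _valid(values)
    if len(valid) < min_count:
        return None
    return math.prod(valid)  # an empty product is 1 (math.prod needs Python 3.8+)


print(column_sum([None, None]))                    # 0
print(column_sum([None, None], min_count=1))       # None
print(column_prod([2.0, None, 3.0], min_count=2))  # 6.0
```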
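The ddof-to-correction rename changes only the parameter's name, not the arithmetic: the sum of squared deviations is divided by N - correction, so correction=0 gives the population standard deviation and correction=1 gives the sample (Bessel-corrected) one. A minimal sketch with a hypothetical std helper; the default correction=1 mirrors pandas' default ddof=1 and is an assumption, not something this proposal states.

```python
import math
from typing import Sequence


def std(values: Sequence[float], correction: float = 1) -> float:
    """Standard deviation with divisor N - correction (the role NumPy and pandas give to ddof)."""
    n = len(values)
    if n - correction <= 0:
        return math.nan  # divisor would be zero or negative
    mean = math.fsum(values) / n
    squared_deviations = math.fsum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - correction))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std(data, correction=0))  # 2.0 (population)
print(std(data, correction=1))  # sample standard deviation, sqrt(32 / 7)
```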